Abstract

The study of quantum circuits composed of commuting gates is particularly useful to
understand the delicate boundary between quantum and classical computation. Indeed,
while being a restricted class, commuting circuits exhibit genuine quantum effects such
as entanglement. In this paper we show that the computational power of commuting
circuits exhibits a surprisingly rich structure. First we show that every 2-local commuting
circuit acting on d-level systems and followed by single-qudit measurements can be
efficiently simulated classically with high accuracy. In contrast, we prove that such strong
simulations are hard for 3-local circuits. Using sampling methods we further show that
all commuting circuits composed of exponentiated Pauli operators $e^{i\theta P}$ can be simulated
efficiently classically when followed by single-qubit measurements. Finally, we show that
commuting circuits can efficiently simulate certain non-commutative processes, related
in particular to constant-depth quantum circuits. This gives evidence that the power of
commuting circuits goes beyond classical computation.

1 Introduction

Since the discovery of Shor's factoring algorithm [1], the question whether quantum
computers possess exponentially more power than classical computers has been one of the
central problems in the field. Similar to other notorious problems in computational
complexity theory, this question is very difficult. For example, a proof that
$\mathrm{P} \neq \mathrm{BQP}$ would imply that $\mathrm{P} \neq \mathrm{PSPACE}$, which is a
longstanding open problem. A useful approach to gain insight
into the relationship between quantum and classical computing power is to study restricted
classes of quantum circuits and analyze their power. For several restricted but nontrivial
classes of quantum circuits, it has been found that efficient classical simulations are
possible. For instance, if in each step of a quantum circuit the entanglement (quantified by
the p-blockedness [2] or by the Schmidt rank [3]) is bounded, the circuit can be simulated
efficiently classically. Such results demonstrate that certain types of entanglement must be
generated in sufficiently large amounts if a quantum algorithm is to yield an exponential
speed-up. Certain other circuit classes can be simulated classically using entirely different
arguments not based on entanglement considerations [4–9], e.g. by using the Pauli stabilizer
formalism [4] or the framework of matchgate tensors [5–7].

Conversely, it has been shown that some restricted quantum computation schemes can
perform tasks that appear to be hard classically [10–12]. For example, in [10] the
hardness of simulating linear optical quantum computation was discussed. In [11] it was shown
that efficiently simulating the output probability distribution of commuting quantum circuits
classically would imply a collapse of the polynomial hierarchy and is thus highly unlikely.
Besides their theoretical importance, these results also lower the threshold to demonstrate
nontrivial quantum computation in experiments.

In this paper we focus on commuting quantum circuits. Several features make such 
circuits interesting. For example, commuting circuits exhibit genuine quantum effects, e.g. 
they can generate highly entangled states (such as cluster states [13]). Further, since
commuting operations can be performed simultaneously, there is no time order in the
computation, which is drastically different from other computational models. Moreover, all gates
in the circuit can be diagonalized simultaneously. The latter property might at first sight
suggest an intrinsic simplicity of this circuit class; however it is important to note that
the diagonalizing unitary can be a complex entangling operator. In [11] as well as in the
present paper evidence is given that commuting circuits indeed have nontrivial power
beyond classical computation. It is also interesting to note that commuting operations have
recently caught attention in different areas as well, such as the study of the local Hamiltonian
problem [14–17].

Compared to earlier work [11, 18, 19], which considered commuting gates that can be
diagonalized in a local basis, we will consider general commuting gates acting on d-level
systems. We will show that the computational power of commuting circuits exhibits a
surprisingly rich structure. For example, the degree of hardness varies significantly depending
on whether the gates are 2-local or 3-local. This indicates that commuting quantum circuits
might serve as an interesting intermediate class between classical and universal quantum
circuits.

Our main results can be summarized as follows (here the terms "strongly" and "weakly" 
specify different notions of classical simulation, to be defined below): 

• 2-local circuits are easy. All uniform families of commuting circuits consisting of 2- 
local gates acting on d-level systems and followed by a single measurement can be 
strongly simulated by classical computation, for every d. 

• 3-local circuits are hard. Uniform families of commuting circuits consisting of 3-local 
gates acting on qubit systems and followed by a single measurement cannot be strongly 
simulated by classical computation, unless every problem in #P has a polynomial-time 
classical algorithm. 

• Commuting Pauli circuits. All uniform families of commuting circuits consisting of
exponentiated Pauli operators $e^{i\theta P}$ and followed by a single-qubit measurement can be
efficiently simulated classically in the weak sense. Furthermore, even when such circuits display
a small degree of non-commutativity, an efficient classical simulation remains possible.

• Mapping non-commuting circuits to commuting circuits. Certain non-commuting 
quantum processes (related to bounded-depth circuits) can be efficiently simulated by 
purely commuting quantum circuits. 

Finally, it is noteworthy that several distinct techniques were used to prove the above 
results, including tensor network methods, sampling methods as well as the Pauli stabilizer 
formalism. This is also an illustration of the rich structure displayed by commuting quantum 
circuits. 
2 Preliminaries



2.1 Commuting quantum circuits 

The quantum circuits considered in this work will always be unitary. The size of a circuit is 
the number of gates of which it consists. A d-level Hilbert space will sometimes generically 
be called a "qudit". For an operator $A$ acting on a system of n qudits labeled by $1, \dots, n$, the
support of $A$ is the subset of qudits on which it acts nontrivially. A quantum circuit acting
on n qudits is said to be k-local if the support of each of its gates contains at most k qudits.
A family of n-qubit quantum circuits $C_n$ has polynomial size m if m scales polynomially
with n, denoted by m = poly(n).

A commuting circuit is a quantum circuit consisting of pairwise commuting gates. A
k-local commuting circuit is in standard form if for every subset $S \subseteq \{1, \dots, n\}$ consisting
of k qudits there is at most one gate $G_i$ with $\mathrm{supp}(G_i) \subseteq S$. A k-local commuting circuit
$C = G_m \cdots G_1$ in standard form contains at most $\binom{n}{k}$ gates, so that the size of the circuit
scales polynomially with n if k is constant. For example, a two-local commuting circuit is
in standard form if for every $i, j = 1, \dots, n$ with $i < j$ there is at most one gate in the circuit
with support contained in $\{i, j\}$; such a circuit has size $O(n^2)$. Every k-local commuting
circuit can be brought into standard form by replacing all gates in the circuit with support
contained in S by a single gate given by the total product of these gates, for every subset
S consisting of k qudits. Furthermore, if k is constant and if the original circuit has size m,
then this procedure to bring a circuit into standard form can be carried out efficiently, i.e. in
poly(n, m) steps.

A simple example of a commuting circuit is $C = G_m \cdots G_1$ with gates

    $G_i = (U \otimes U)\, D_i\, (U^\dagger \otimes U^\dagger)$    (1)

where $U$ is a fixed single-qudit unitary operator (independent of i) and where each $D_i$ is
a diagonal unitary operator. In other words each gate is diagonal in the same local basis.
This class of commuting circuits has been considered in [11, 18].

By their commutativity, all gates in any commuting circuit can be diagonalized
simultaneously, i.e. there exists a unitary operator $V$ such that $V G_i V^\dagger$ is diagonal for every gate $G_i$.
In the example (1), the diagonalizing operator is a simple tensor product $V = U^\dagger \otimes \cdots \otimes U^\dagger$.
This example does however not represent the most general situation since $V$ may be a global,
entangling operation, even when the commuting circuit is k-local with k constant.
Consider for example an n-qubit 3-local circuit with gates $G_j = e^{i\theta_j K_j}$ where the commuting
operators $K_j$ are defined as follows:

    $K_1 = X_1 Z_2, \quad K_i = Z_{i-1} X_i Z_{i+1}, \quad K_n = Z_{n-1} X_n, \quad \text{with } i = 2, \dots, n-1.$    (2)

Here $X_i$ and $Z_i$ denote the Pauli X and Z operators acting on qubit i. The operators $K_j$
are the well-known stabilizers of the 1D cluster state. Let $H$ denote the Hadamard gate
and let $CZ = \mathrm{diag}(1, 1, 1, -1)$ denote the controlled-Z gate. It is then easily verified that
the entangling operation

    $V = H^{\otimes n} \prod_{i=1}^{n-1} CZ_{i,i+1}$    (3)

sends $K_j \to V K_j V^\dagger = Z_j$ and thus simultaneously diagonalizes the $K_j$. Furthermore it can
be shown that no tensor product of single-qubit operations can perform such a diagonalization.

The example (2) shows that there exist k-local commuting circuits where the
diagonalizing unitary $V$ is a global, entangling operator. Nevertheless this example is still rather
well-behaved as $V$ can be computed efficiently and moreover has a relatively simple
structure. In fact in section 5 we will investigate commuting circuits composed of exponentiated
Pauli operators $e^{i\theta P}$ in more detail and show that such circuits have efficient classical
simulations (relative to certain measurements). For general k-local commuting circuits, however,
the unitary $V$ may have a complex structure and be computationally difficult to determine.
This feature is in part responsible for the complexity of commuting quantum circuits.
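
As a small illustration of example (2), the following sketch builds the cluster-state stabilizers as Pauli strings and checks that they commute pairwise. The function names `cluster_stabilizers` and `commute` are hypothetical helpers introduced only for this illustration; the check relies on the standard fact that two Pauli strings commute exactly when they clash (both non-identity and different) on an even number of qubits.

```python
from itertools import combinations


def cluster_stabilizers(n):
    """Return K_1, ..., K_n of the 1D cluster state as strings over 'IXZ'."""
    ops = []
    for j in range(n):
        letters = ["I"] * n
        letters[j] = "X"              # X on the central qubit
        if j > 0:
            letters[j - 1] = "Z"      # Z on the left neighbour, if any
        if j < n - 1:
            letters[j + 1] = "Z"      # Z on the right neighbour, if any
        ops.append("".join(letters))
    return ops


def commute(p, q):
    """Pauli strings commute iff they clash on an even number of positions."""
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 0


stabilizers = cluster_stabilizers(5)
all_commute = all(commute(p, q) for p, q in combinations(stabilizers, 2))
print(stabilizers)
print(all_commute)
```

Neighbouring stabilizers clash on exactly two qubits, which is why the gates $e^{i\theta_j K_j}$ commute even though each $K_j$ mixes X and Z operators.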

2.2 Classical simulations of quantum circuits 

There are several valid notions of efficient classical simulations of quantum circuits. Two 
notions will be considered in this work viz. strong and weak simulations. Their main 
difference lies in the accuracy achieved in the classical simulation: roughly speaking, strong 
simulations achieve an exponential precision whereas weak simulations achieve polynomial 
precision. We mainly follow the definitions of 

Consider a uniform family of k-local n-qubit quantum circuits $C_n$ for some constant k.
The input states are standard basis states. The circuits are followed by measurement of the
Pauli observable Z on the first qubit. The expectation value is denoted by $\langle Z_1 \rangle$.

Strong simulations. We say that $C_n$ can be efficiently simulated classically in the strong
sense if there exists a classical algorithm with runtime $\mathrm{poly}(n, \log \frac{1}{\epsilon})$ which outputs a number
$E$ such that

    $|E - \langle Z_1 \rangle| \le \epsilon.$    (4)

Thus a strong simulation algorithm achieves an exponential accuracy $\epsilon = 2^{-\mathrm{poly}(n)}$ in poly(n)
time.

Weak simulations. We say that $C_n$ can be efficiently simulated classically in the weak sense
if there exists a classical algorithm with runtime $\mathrm{poly}(n, \frac{1}{\epsilon})$ which outputs a number $E$
satisfying (4). Thus a weak simulation algorithm achieves polynomial accuracy $\epsilon = 1/\mathrm{poly}(n)$
in polynomial time. We will often allow weak simulations to fail with an exponentially small
probability. In this sense, we say that $C_n$ can be efficiently simulated classically in the weak
sense if there exists a probabilistic classical algorithm with runtime
$\mathrm{poly}(n, \frac{1}{\epsilon}, \log \frac{1}{1-p})$ which
outputs a number $E$ satisfying (4) with probability p. Thus for polynomial accuracies and
for success probabilities which are exponentially (in n) close to 1, the classical simulation
runs in poly(n) time.

The motivation for the definition of a weak simulation originates from the fact that the polynomial
error scaling $\epsilon = 1/\mathrm{poly}(n)$ captures how accurately the expectation value $\langle Z_1 \rangle$ can be
estimated by running the quantum circuit $C_n$ polynomially many times. See section 2.3
below and [8] for a more extensive discussion.

The above definitions can readily be generalized to take into account more general inputs 
(e.g. arbitrary product states) and measurements (e.g. arbitrary single-qubit observables) 
as well as d-level systems. Finally, note that we will often use the term "simulation" as
shorthand for "efficient classical simulation". 
2.3 Chernoff-Hoeffding bound

The Chernoff-Hoeffding bound is a tool to bound how accurately the expectation value of a
random variable may be approximated using "sample averages". Let $X_1, \dots, X_K$ be i.i.d.
real-valued random variables with $E := \mathbb{E} X_i$ and $X_i \in [-1, 1]$ for every $i = 1, \dots, K$. Then
the Chernoff-Hoeffding bound asserts that

    $\mathrm{Prob}\left\{ \left| \frac{1}{K} \sum_{i=1}^{K} X_i - E \right| \le \epsilon \right\} \ge 1 - 2e^{-K\epsilon^2/4}.$    (5)

For complex-valued $X_i$ a similar bound can be obtained for $|X_i| \le 1$.

As an illustration, consider an n-qubit quantum circuit family $C_n$ followed by
measurement of $Z_1$ as in section 2.2. Suppose that the circuit is run K times, yielding an outcome
$z_i \in \{1, -1\}$ in each run. Using (5) one shows that the number $E := [\sum_i z_i]/K$, where the
sum is over all $i = 1, \dots, K$, satisfies $|E - \langle Z_1 \rangle| \le \epsilon$ with probability $p \ge 1 - 2e^{-K\epsilon^2/4}$.
Consequently, for any $\epsilon = 1/\mathrm{poly}(n)$ there exists a suitable $K = \mathrm{poly}(n)$ such that $|E - \langle Z_1 \rangle| \le \epsilon$
holds with probability p exponentially close to 1. In other words, the above procedure allows
one to achieve a polynomial approximation of $\langle Z_1 \rangle$ in polynomial time with exponentially small
probability of failure. This performance of the quantum computation corresponds precisely
to the performance required of weak classical simulations, cf. section 2.2.
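
To make the scaling concrete, the sketch below solves bound (5) for the number of runs K needed to reach accuracy $\epsilon$ with a prescribed failure probability. The helper names `samples_needed` and `failure_bound`, and the choice n = 20, are assumptions made only for this illustration.

```python
import math


def failure_bound(k, eps):
    """Failure probability 2*exp(-K*eps^2/4) appearing in bound (5)."""
    return 2.0 * math.exp(-k * eps**2 / 4.0)


def samples_needed(eps, fail_prob):
    """A K with 2*exp(-K*eps^2/4) <= fail_prob (smallest up to rounding)."""
    return math.ceil(4.0 * math.log(2.0 / fail_prob) / eps**2)


n = 20
eps = 1.0 / n              # polynomial accuracy eps = 1/n
fail_prob = 2.0 ** (-n)    # failure probability exponentially small in n
K = samples_needed(eps, fail_prob)
print(K, failure_bound(K, eps))
```

Since $K = 4 \ln(2/\text{fail\_prob})/\epsilon^2$ grows like $n \cdot n^2$ for these choices, the number of runs stays polynomial in n even though the failure probability is exponentially small.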



3 2-Local commuting circuits are easy 

Here we consider 2-local commuting circuits acting on general d-level systems. The main 
conclusion of this section will be that such circuits, when followed by single-qudit measure- 
ments, cannot outperform classical computation. In fact we will show that their power is 
even strictly contained in P and give a concrete example of a simple function which cannot 
be computed with such commuting circuits. 

3.1 Efficient strong simulation of one qudit 

Theorem 1. (Strong simulations of 2-local commuting circuits) Let C be a uniform 
family of 2-local n-qudit commuting circuits, acting on a product input state and followed by 
measurement of an observable O acting on qudit i for some i. Any such computation can 
be efficiently simulated classically in the strong sense. 

Proof. We prove the result for $i = 1$; other i are treated fully analogously. Denote the input
by $|a\rangle = |a_1\rangle \cdots |a_n\rangle$ where each $|a_i\rangle$ is a single-qudit state. We can assume without loss
of generality that $C = \prod U_{jk}$ is in standard form, where $U_{jk}$ represents the unique gate in
the circuit with support $S \subseteq \{j, k\}$, for every $j, k = 1, \dots, n$ with $j < k$. If $U_{jk}$ does not act
on qudit 1, then this gate commutes with $O$. Hence in the product $C^\dagger O C$ we can commute
$U_{jk}$ through $C$ and $O$ to the left until it cancels out with $U_{jk}^\dagger$. By doing so, we can remove
all gates that do not act on qudit 1. Therefore the expectation value of $O$ is

    $\langle O \rangle = \langle a| C^\dagger O C |a\rangle = \langle a| \big[\prod U_{1j}\big]^\dagger O \big[\prod U_{1j}\big] |a\rangle$    (6)

where the products are over all $j \ge 2$. Now our strategy will be to trace out qudits one by
one in the above equation. Denote $|a^{(1)}\rangle = |a\rangle$, $C^{(1)} = C$ and $O^{(1)} = O$. Furthermore for
every $k = 2, \dots, n-1$ define

    $|a^{(k)}\rangle = |a_1\rangle |a_{k+1}\rangle \cdots |a_n\rangle$
    $C^{(k)} = U_{1,k+1} \cdots U_{1n}$
    $O^{(k)} = [I \otimes \langle a_k|]\, U_{1k}^\dagger\, O^{(k-1)}\, U_{1k}\, [I \otimes |a_k\rangle].$    (7)

Remark that each $O^{(k)}$ acts on a single qudit (namely qudit 1). Furthermore each of these
operators can be computed classically with exponential precision in polynomial time: $O^{(1)}$
is given as an input and each update from $O^{(k)}$ to $O^{(k+1)}$ involves simple multiplications of
2-qudit operations which can be done in constant time (taking $O(d^6)$ steps where d denotes
the dimension of one qudit).

With the above definitions one finds, for every $k = 2, \dots, n-1$:

    $\langle a^{(k-1)}| [C^{(k-1)}]^\dagger O^{(k-1)} C^{(k-1)} |a^{(k-1)}\rangle = \langle a^{(k)}| [C^{(k)}]^\dagger O^{(k)} C^{(k)} |a^{(k)}\rangle.$    (8)

Using this equation iteratively, we get

    $\langle O \rangle = \langle a^{(1)}| [C^{(1)}]^\dagger O^{(1)} C^{(1)} |a^{(1)}\rangle = \cdots = \langle a^{(n-1)}| U_{1n}^\dagger O^{(n-1)} U_{1n} |a^{(n-1)}\rangle.$    (9)

The last expression is easily computed since $|a^{(n-1)}\rangle$ is a 2-qudit state and $U_{1n}$ and $O^{(n-1)}$
act on at most 2 qudits. □
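
The iteration in the proof can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions: the helper names (`matmul`, `dagger`, `kron_with_identity`, `trace_out_partner`, `expectation`, `zz_gate`) are hypothetical, gates not acting on qudit 1 are assumed to have been removed already, and for brevity the update of the form (7) is applied for every partner qudit up to k = n and the result is then contracted with $|a_1\rangle$, which is equivalent to stopping at (9). The toy instance uses commuting qubit gates $e^{i\theta_k Z \otimes Z}$, inputs $|+\rangle$ and $O = X$, for which $\langle O \rangle = \prod_k \cos(2\theta_k)$.

```python
import cmath
import math


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def dagger(a):
    """Conjugate transpose."""
    return [[a[j][i].conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]


def kron_with_identity(op, d):
    """Return op (x) I_d for a d x d matrix op (qudit 1 is the first factor)."""
    out = [[0j] * (d * d) for _ in range(d * d)]
    for i in range(d):
        for j in range(d):
            for s in range(d):
                out[i * d + s][j * d + s] = op[i][j]
    return out


def trace_out_partner(op, gate, state_k, d):
    """One update O -> [I (x) <a_k|] U^dagger (O (x) I) U [I (x) |a_k>]."""
    m = matmul(matmul(dagger(gate), kron_with_identity(op, d)), gate)
    return [[sum(state_k[s].conjugate() * m[i * d + s][j * d + t] * state_k[t]
                 for s in range(d) for t in range(d))
             for j in range(d)] for i in range(d)]


def expectation(op, gates, states, d):
    """<O> on qudit 1; gates[k] acts on qudit 1 and the qudit in states[k+1]."""
    for gate, state_k in zip(gates, states[1:]):
        op = trace_out_partner(op, gate, state_k, d)
    a1 = states[0]
    return sum(a1[i].conjugate() * op[i][j] * a1[j]
               for i in range(d) for j in range(d))


def zz_gate(theta):
    """exp(i*theta*Z(x)Z) as a diagonal 4 x 4 matrix."""
    phases = [theta, -theta, -theta, theta]
    return [[cmath.exp(1j * phases[r]) if r == c else 0j for c in range(4)]
            for r in range(4)]


thetas = [0.3, 0.7, 1.1]
plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]
pauli_x = [[0, 1], [1, 0]]
value = expectation(pauli_x, [zz_gate(t) for t in thetas], [plus] * 4, 2)
print(value.real)
```

Each update touches only a $d^2 \times d^2$ matrix, so the total cost grows linearly with the number of gates acting on qudit 1, independently of the $d^n$-dimensional state space.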

The above result can readily be generalized in different ways. First, using a similar
argument one shows that measurement of any observable acting on $O(\log n)$ qudits can be
strongly simulated as well. Furthermore, interestingly, the result also generalizes to mutually
anticommuting gates, and more generally to gates which commute "up to a phase" as follows.
Let $C = \prod G_i$ be a uniform family of 2-local n-qudit circuits such that $G_i G_j = \gamma_{ij} G_j G_i$ for
all pairs of gates, where the $\gamma_{ij}$ are complex phases. Input and measurement are as in
theorem 1. Then such circuits can be efficiently simulated classically in the strong sense.
Analogous to the first step in the proof of theorem 1, the proof starts by "removing" all
gates which do not act on qudit i from the product $C^\dagger O C$ by commuting them through the
circuit. This introduces an (easily computed) product of phases $\gamma_{ij}$. The remainder of the
proof of theorem 1 carries over straightforwardly.

3.2 2-local commuting circuits cannot compute all functions in P 

Here we show that two-local commuting circuits are not universal for classical computation 
by giving an explicit example of a function which is not computable with such circuits. 

For every d we let $\mathbb{Z}_d$ denote the set of integers modulo d. Let $C$ denote a two-local
commuting circuit acting on m d-level systems. Consider a function $f : \mathbb{Z}_d^k \to \mathbb{Z}_d$. We say
that $C$ computes $f$ with probability at least p if the circuit $C$ acting on $|x, 0\rangle$ (where 0 denotes
a string of $m - k$ zeroes) and followed by a standard basis measurement of the first qudit
yields the outcome $f(x)$ with probability at least p.

We will in particular consider the "inner product function" $f_{ip} : \mathbb{Z}_d^{2n} \to \mathbb{Z}_d$ defined by

    $f_{ip}(x_a, x_b) = (x_a)^T x_b \bmod d,$    (10)

for every $x_a, x_b \in \mathbb{Z}_d^n$.



Lemma 1. Let $\sigma_1, \dots, \sigma_N$ be a collection of $d \times d$ density operators. For any $\epsilon > 0$, if
$N > (10/\epsilon)^{2d^2}$, then there exist two operators $\sigma_j$ and $\sigma_k$ such that $\|\sigma_j - \sigma_k\|_{tr} < \epsilon$, where
$\|A\|_{tr} = \frac{1}{2} \mathrm{tr} \sqrt{A^\dagger A}$ denotes the trace distance.

Proof. We will show that for any $\epsilon > 0$ there exists a finite set $E$ of $d \times d$ density operators such
that for every density operator $\rho$ there exists a $\sigma \in E$ with $\|\rho - \sigma\|_{tr} < \epsilon$ (we call $E$ an $\epsilon$-net).
To do this, first we recall that every density operator of dimension d has a purification, obtained by
introducing an ancillary d-dimensional space $R$. In [20] it was shown that for pure
states of dimension $d^2$ there exists an $\epsilon$-net $F$ with cardinality $|F| \le (5/\epsilon)^{2d^2} = M$. We can
then choose the set $E$ to be $\mathrm{tr}_R F$, which is obtained by taking the partial trace of each element of $F$. Since
the partial trace is a contractive operation [21], i.e. $\|\mathrm{tr}_R(\mu - \tau)\|_{tr} \le \|\mu - \tau\|_{tr}$, we know that
the set $E$ obtained this way is indeed an $\epsilon$-net.

Note that $|E| \le |F| \le M$. Now if there are more than M density operators, then there
must be two density operators $\sigma_j$, $\sigma_k$ that are $\epsilon$-close to the same element of $E$. Thus by
the triangle inequality $\|\sigma_j - \sigma_k\|_{tr} < 2\epsilon$. The proof can be finished by a rescaling of $\epsilon$. □

Theorem 2. Consider an arbitrary d and an arbitrary constant $p > 1/2$. For sufficiently
large n, the inner product function $f_{ip}$ is not computable with probability at least p by any
two-local commuting circuit.

Proof. Write $f := f_{ip}$. Suppose there exists an m-qudit two-local commuting circuit $C$, for some
$m \ge 2n$, which computes $f$ with probability $p > 1/2$. We show that this leads to a contradiction. Repeating
the argument of theorem 1 we can remove all gates from the circuit which do not act on
qudit 1. We denote this simplified circuit again by $C$. Now write $C = C_b C_a$, where $C_a$ consists
of all gates in the circuit acting on qudits $\{1, i\}$ with $i = 1, \dots, n$ and where $C_b$ consists of
all gates acting on qudits $\{1, j\}$ with $j = n+1, \dots, m$. Furthermore, let $x = (x_a, x_b)$ be an
arbitrary input of $f$. Finally, denote

    $\sigma(x_a) := \mathrm{Tr}_{n, \dots, 2}\, C_a |x_a\rangle\langle x_a| C_a^\dagger$    (11)

which is the reduced density operator for qudit 1 of the state $C_a |x_a\rangle$.

The final state of the entire circuit is $C|x, 0\rangle$ where 0 denotes a string of $m - 2n$ zeroes.
With the notations above, the reduced density operator of the first qudit is

    $\rho(x_a, x_b) := \mathrm{Tr}_{m, \dots, 2}\, C |x, 0\rangle\langle x, 0| C^\dagger = \mathrm{Tr}_{m, \dots, n+1} \mathrm{Tr}_{n, \dots, 2}\, C_b C_a |x, 0\rangle\langle x, 0| C_a^\dagger C_b^\dagger$
    $\qquad = \mathrm{Tr}_{m, \dots, n+1}\, C_b \left[ \sigma(x_a) \otimes |x_b, 0\rangle\langle x_b, 0| \right] C_b^\dagger.$    (12)

We now use lemma 1. Since the number $d^n$ of possible inputs $x_a$ grows with n whereas the bound in
lemma 1 depends only on d and $\epsilon$, for every $\epsilon > 0$ there exists an n sufficiently large and
two n-tuples $x_a \neq y_a$ such that $\|\sigma(x_a) - \sigma(y_a)\|_{tr} < \epsilon$. Using (12) and the fact that the
trace norm is contractive, it follows that $\|\rho(x_a, x_b) - \rho(y_a, x_b)\|_{tr} < \epsilon$ for every n-tuple $x_b$.
This implies the following: if a standard basis measurement on $\rho(x_a, x_b)$ yields some outcome
u with probability $p(u)$, then a standard basis measurement on $\rho(y_a, x_b)$ will yield the same
outcome with probability $q(u)$ where $|p(u) - q(u)| < \epsilon$. Setting $\epsilon = p - \frac{1}{2}$ and using that
$C$ computes $f$ with probability at least p, it then follows that $f(x_a, x_b) = f(y_a, x_b)$ for all
$x_b$. Using the definition of $f$, this straightforwardly implies that $x_a = y_a$, thus leading to a
contradiction. □
4 3-Local commuting circuits are hard



Next we show that strong simulations of 3-local commuting circuits are unlikely to exist.

Theorem 3 (Hardness of simulating 3-local commuting circuits). Let $C$ be a uniform
family of n-qubit 3-local commuting quantum circuits acting on the input $|0\rangle$ and followed by
Z measurement of the first qubit. If all such circuits could be efficiently simulated classically
in the strong sense then every problem in #P has a polynomial-time algorithm.

In other words, there is a drastic increase in complexity in the seemingly innocuous
transition from 2-local to 3-local gates. Remark that hardness already holds for the simplest case,
i.e. qubit systems, even though d-level 2-local commuting circuits have efficient simulations
for any d. Hardness of strong simulations does not necessarily imply that weak simulations
are hard as well since strong and weak simulations are generally inequivalent concepts (cf. [8]
for a discussion). In section 6 we will provide evidence that k-local commuting circuits with
constant k can efficiently perform certain tasks that appear to be nontrivial for classical
computers, thereby providing evidence that efficient weak simulations might not exist in
general.

The proof of theorem 3 is given below. Our approach is to relate simulations of 3-local
commuting circuits to the evaluation of matrix elements of universal unitary quantum
circuits. The following three lemmata collect preliminary results.
First we recall that the evaluation of matrix elements of universal quantum circuits is known
to be hard. We denote $S := \mathrm{diag}(1, e^{i\pi/4})$ and $CZ := \mathrm{diag}(1, 1, 1, -1)$.

Lemma 2. Let $\mathcal{U}$ be a uniform family of n-qubit quantum circuits composed of the gates
H, S and CZ. If there existed an algorithm with runtime $\mathrm{poly}(n, \log \frac{1}{\epsilon})$ which outputs an
$\epsilon$-approximation of $\langle 0|\mathcal{U}|0\rangle$ for any such circuit family, then every problem in #P has a
polynomial-time algorithm.

Proof. Consider an efficiently computable Boolean function $f : \{0, 1\}^n \to \{0, 1\}$. Let $s(f)$
denote the number of bit strings x satisfying $f(x) = 0$. The problem of computing $s(f)$ is
well known to be #P-complete. Now define the (n + 1)-qubit state $|f\rangle := 2^{-n/2} \sum_x |x, f(x)\rangle$
where the sum is over all n-bit strings x. Let $\mathcal{H}$ be the operator which acts as H on qubits
1 to n and as the identity on qubit n + 1. Then an easy calculation shows

    $\langle 0|\mathcal{H}|f\rangle = s(f)/2^n.$    (13)

Since H, CZ and S form a universal gate set, the Solovay-Kitaev theorem implies that there
exists a uniform circuit family $\mathcal{V}$ composed of these gates such that $\mathcal{V}|0\rangle$ is $\delta$-close to $|f\rangle$
with $\delta := 2^{-n^2}$. Denote the circuit $\mathcal{U} := \mathcal{H}\mathcal{V}$. Using (13) it follows that

    $\left| \langle 0|\mathcal{U}|0\rangle - s(f)/2^n \right| \le \delta.$    (14)

Now suppose that there exists a $\mathrm{poly}(n, \log \frac{1}{\epsilon})$ classical algorithm to compute $\langle 0|\mathcal{U}|0\rangle$ with
accuracy $\epsilon$. Setting $\epsilon = \delta$, this would imply the existence of a polynomial-time classical
algorithm that outputs an $\epsilon$-approximation $\gamma$ of $\langle 0|\mathcal{U}|0\rangle$. Using (14) and the triangle inequality
this implies that $\gamma$ approximates $s(f)/2^n$ with accuracy $2\delta$. Since $s(f)/2^n = k/2^n$ for some
integer k between 0 and $2^n$, this accuracy would allow one to compute $s(f)$ exactly in polynomial
time, hence implying that every problem in #P has a poly-time algorithm. □
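
The "easy calculation" behind identity (13) can be spelled out as follows. This is a sketch that writes $\mathcal{H}$ for the operator acting as H on qubits 1 to n and as the identity on qubit n + 1, and uses only the fact that $\langle 0|^{\otimes n} H^{\otimes n} |x\rangle = 2^{-n/2}$ for every n-bit string x; the Kronecker delta $\delta_{f(x),0}$ is notation introduced here.

```latex
\begin{align*}
\langle 0|^{\otimes (n+1)}\, \mathcal{H}\, |f\rangle
  &= 2^{-n/2} \sum_{x} \langle 0|^{\otimes n} H^{\otimes n} |x\rangle \,
     \langle 0 | f(x) \rangle \\
  &= 2^{-n/2} \sum_{x} 2^{-n/2}\, \delta_{f(x),0}
   = \frac{s(f)}{2^{n}} .
\end{align*}
```

Only the strings x with $f(x) = 0$ contribute, and there are exactly $s(f)$ of them.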
Figure 1: The Hadamard test

Second, we recall a result from [11] which relates universal quantum circuits to
postselected 2-local commuting circuits.

Lemma 3. Let $\mathcal{U}$ be an n-qubit quantum circuit composed of the gates H, S and CZ and
denote $|\psi\rangle = \mathcal{U}|0\rangle^n$. Then there exists a 2-local commuting circuit $C$ on k + n qubits such
that $|\psi\rangle$ is obtained by postselecting $C|0\rangle^{k+n}$ on the first k qubits; more precisely

    $|0\rangle^k |\psi\rangle = \sqrt{2^k}\, \mathcal{P}\, C |0\rangle^{k+n}.$    (15)

Here $\mathcal{P}$ denotes the projector $|0\rangle\langle 0|$ acting on the first k qubits. Furthermore k = poly(n)
and the description of $C$ can be computed efficiently on input of the description of $\mathcal{U}$.

Combining the above two lemmata shows that approximating matrix elements of
commuting 2-local circuits is hard.

Lemma 4. Let $C$ be a uniform family of n-qubit 2-local commuting quantum circuits. If there
existed a classical algorithm with runtime $\mathrm{poly}(n, \log \frac{1}{\epsilon})$ which outputs an $\epsilon$-approximation
of $\langle 0|C|0\rangle$ for any such $C$, then every problem in #P has a poly-time algorithm.

Proof. Let $\mathcal{U}$ be a uniform family of n-qubit quantum circuits composed of the gates H, S
and CZ and let $C$ be the associated commuting circuit family as in lemma 3. Using (15)
one finds

    $\langle 0|^n\, \mathcal{U}\, |0\rangle^n = \sqrt{2^k}\, \langle 0|^{n+k}\, C\, |0\rangle^{n+k}.$    (16)

If an efficient classical algorithm existed to estimate $\langle 0|C|0\rangle$ with exponential precision, then
there would also exist an algorithm to estimate $\langle 0|\mathcal{U}|0\rangle$ with exponential precision. This implies
that every problem in #P has a poly-time algorithm owing to lemma 2. □

The proof of theorem 3 now proceeds by relating the simulation of 3-local commuting
circuits to the evaluation of matrix elements of 2-local commuting circuits, via the Hadamard
test.

Proof of theorem 3. Suppose that an efficient algorithm existed to strongly simulate
the circuits described in the theorem. Consider an arbitrary n-qubit commuting circuit
$C = G_t \cdots G_1$ with two-qubit gates $G_i$. Consider the following (n + 1)-qubit quantum
circuit (the "Hadamard test") with input $|0\rangle$ as depicted in Fig. 1. First H is applied to
the first qubit. Then each gate $G_i$ is applied controlled on the first qubit being in the state
$|1\rangle$; we denote these 3-qubit gates by $CG_i$. Finally, H is again applied to the first qubit.
Measuring the first qubit yields the outcome 0 with probability

    $p(0) = \frac{1}{2}\left(1 + \mathrm{Re}\,\langle 0|C|0\rangle\right).$    (17)

Now for each i define the 3-qubit gate $U_i := [H \otimes I]\, CG_i\, [H \otimes I]$, where H acts on the first
qubit, and let $C'$ denote the circuit composed of the gates $U_i$. Since the gates $G_i$ commute,
also the gates $U_i$ commute. Furthermore, it is straightforward to show that the circuit $C'$
acting on $|0\rangle$ and followed by measurement of the first qubit is equivalent to the circuit
in Fig. 1, since the Hadamard operations "in the middle" cancel out. Thus $C'$ also yields
the outcome 0 with probability $p(0)$. It follows that the existence of an efficient classical
algorithm to strongly simulate the circuit $C'$ yields an efficient classical algorithm to compute
the real part of $\langle 0|C|0\rangle$ with exponential precision. Replacing the second Hadamard gate in
Fig. 1 by PH where $P = \mathrm{diag}(1, i)$ and arguing analogously yields an efficient algorithm
to estimate the imaginary part of $\langle 0|C|0\rangle$ with exponential precision. Using lemma 4 we
conclude that this would imply that every problem in #P has a poly-time algorithm. □
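
Formula (17) for the Hadamard test can be checked numerically on a toy instance. The sketch below is an assumption-laden illustration rather than part of the proof: it tracks the state as two blocks (control qubit 0 and control qubit 1), the helper names `apply_h_on_control` and `hadamard_test_p0` are hypothetical, and the "circuit" C is the single diagonal two-qubit gate $e^{i\varphi Z \otimes Z}$, for which $\langle 0|C|0\rangle = e^{i\varphi}$.

```python
import cmath
import math


def apply_h_on_control(b0, b1):
    """Hadamard on the control qubit; b0/b1 are the control-0/1 blocks."""
    s = 1 / math.sqrt(2)
    return ([s * (x + y) for x, y in zip(b0, b1)],
            [s * (x - y) for x, y in zip(b0, b1)])


def hadamard_test_p0(unitary):
    """Probability of outcome 0 on the control qubit of the Hadamard test."""
    dim = len(unitary)
    b0 = [1 + 0j] + [0j] * (dim - 1)   # control 0, target register |0...0>
    b1 = [0j] * dim                    # control 1 branch starts empty
    b0, b1 = apply_h_on_control(b0, b1)
    # Controlled-C: apply C only to the control-1 block.
    b1 = [sum(unitary[r][c] * b1[c] for c in range(dim)) for r in range(dim)]
    b0, b1 = apply_h_on_control(b0, b1)
    return sum(abs(x) ** 2 for x in b0)


phi = 0.8
phases = [phi, -phi, -phi, phi]        # exp(i*phi*Z(x)Z) is diagonal
zz = [[cmath.exp(1j * phases[r]) if r == c else 0j for c in range(4)]
      for r in range(4)]
p0 = hadamard_test_p0(zz)
print(p0, 0.5 * (1 + math.cos(phi)))
```

The two printed numbers agree, reflecting that the control-0 branch carries the vector $(|0\rangle + C|0\rangle)/2$, whose squared norm is $\frac{1}{2}(1 + \mathrm{Re}\,\langle 0|C|0\rangle)$.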



5 Efficient simulation of commuting Pauli Circuits 

A circuit composed of unitary operators of the form $e^{i\theta P}$, where the P are (Hermitian) Pauli
operators, is called a Pauli circuit. Recall that every two Pauli operators either commute or
anticommute. Pauli circuits are easily seen to be universal for quantum computation. Here
we investigate commuting Pauli circuits. We allow P to act on arbitrarily many qubits, i.e.
we do not restrict to local gates.¹

Given the distinguished status of Pauli operators, commuting Pauli circuits constitute a 
simple and natural class of commuting quantum circuits. This class in fact encompasses the 
model of "instantaneous quantum computation" (IQP) introduced in [18]. IQP corresponds 
to the subclass of commuting Pauli circuits where each P is restricted to be a tensor product 
of identities and Pauli X matrices, so that every gate $e^{i\theta P}$ is diagonalized by the tensor
product operator $H \otimes \cdots \otimes H$. Generalizing IQP to arbitrary commuting Pauli circuits adds
the interesting feature that the unitary operator which simultaneously diagonalizes the gates
in the circuit is generally no longer a tensor product of single-qubit operators, but rather a
global entangling operation; see example (2).

Whereas arbitrary Pauli circuits are universal, we will show that commuting Pauli cir- 
cuits can be efficiently simulated classically in the following sense. 

Theorem 4. (Weak simulation of Commuting Pauli circuits) Every uniform family 
of commuting Pauli circuits acting on a standard basis input and followed by measurement 
of Z acting on one of the qubits can be weakly simulated classically. 

It was shown in [11] that IQP circuits followed by single-qubit standard basis
measurements can be simulated efficiently in the weak sense.² Theorem 4 hence generalizes this result to
arbitrary commuting Pauli circuits. Furthermore, in [11] it was shown that efficient weak
classical simulations (relative to a certain special type of approximations, viz. multiplicative
approximations) of 2-local IQP circuits followed by $O(n)$ computational basis measurements
are highly unlikely to exist: the existence of such simulations would imply a collapse of the
polynomial hierarchy to its third level. Thus a fortiori simulations of $O(n)$ computational
basis measurements are unlikely to exist for general commuting Pauli circuits as well.

One can in fact show a stronger version of theorem 4: a general Pauli circuit containing
a limited degree of non-commutativity can still be simulated classically efficiently.

1 Remark that, even for such non-local gates, every gate $e^{i\theta P}$ can be efficiently implemented on a quantum
computer, i.e. it can be realized by a polynomial-size quantum circuit of elementary gates.

2 In fact classical simulations were also achieved in [11] for $O(\log n)$ measurements; our results can also be
generalized to simulate such measurements for arbitrary commuting Pauli circuits.
Theorem 5. (Weak simulation of slightly non-commuting Pauli circuits) Consider
a uniform family of n-qubit commuting Pauli circuits interspersed with $O(\log n)$ gates of the
form $e^{i\theta Q}$ with Q an arbitrary (Hermitian) Pauli operator. Any such circuit family acting
on standard basis input and followed by measurement of Z acting on one of the qubits can
be weakly simulated classically.

The proofs of theorems 4 and 5 are given in sections 5.3 and 5.4. In the preceding sections
we develop the necessary tools. It is interesting that the simulation techniques used here are
completely different from those used in our simulations of 2-local commuting circuits
(theorem 1). In particular the latter involved strong simulations whereas commuting Pauli
circuits will be simulated using weak simulations combined with stabilizer methods.

5.1 Pauli and Clifford operators 

A Pauli operator on n qubits has the form $P = \alpha P_1 \otimes \cdots \otimes P_n$, where
$\alpha \in \{\pm 1, \pm i\}$ and where each $P_j$ is one of the Pauli matrices X, Y, Z or the
identity. A Pauli operator is said to be of Z-type if each $P_j$ is either Z or the identity;
X-type Pauli operators are defined analogously. Since X, Y and Z are Hermitian, a Pauli
operator is Hermitian if and only if $\alpha \in \{1, -1\}$. Letting $Z_k$ and $X_k$ denote the operators
Z and X acting on qubit k, respectively, it can be verified that every Pauli operator P can be
written as

$$P = i^t \prod_k X_k^{a_k} Z_k^{b_k}, \quad \text{where } t \in \{0,1,2,3\},\ a_k, b_k \in \{0,1\}. \tag{18}$$

Defining the 2n-dimensional bit string

$$r(P) = (a_1, \dots, a_n, b_1, \dots, b_n), \tag{19}$$

it is easily verified that $r(PQ) = r(P) + r(Q)$ for all Pauli operators P and Q, where addition
is modulo 2.
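The map $r$ is easy to experiment with numerically. The following is a minimal Python sketch, not taken from the paper: the function names `pauli_vector`, `add_mod2` and `commute` are illustrative, overall phases are discarded, and the commutation test relies on the standard fact that two Pauli operators commute exactly when the symplectic product of their bit strings vanishes modulo 2.

```python
def pauli_vector(label):
    """Return r(P) = (a_1..a_n, b_1..b_n) for a Pauli string such as 'XYZI'.

    Overall phases are ignored: X -> (a,b)=(1,0), Z -> (0,1), Y -> (1,1), I -> (0,0).
    """
    a = [1 if c in "XY" else 0 for c in label]
    b = [1 if c in "ZY" else 0 for c in label]
    return tuple(a + b)


def add_mod2(r, s):
    """Componentwise sum modulo 2, i.e. r(PQ) = r(P) + r(Q)."""
    return tuple((x + y) % 2 for x, y in zip(r, s))


def commute(r, s):
    """True iff the Pauli operators with bit strings r and s commute."""
    n = len(r) // 2
    overlap = sum(r[k] * s[n + k] + r[n + k] * s[k] for k in range(n))
    return overlap % 2 == 0
```

For instance, `add_mod2(pauli_vector("XX"), pauli_vector("ZZ"))` coincides with `pauli_vector("YY")`, reflecting that XX times ZZ equals YY up to a phase.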

An n-qubit operator U is a Clifford operation if $U P U^\dagger$ is a Pauli operator for every Pauli
operator P. The set of all n-qubit Clifford operations is a group, called the Clifford group.
A Clifford circuit is a quantum circuit composed of H, CNOT and P = diag(1, i). It is
well known that every Clifford circuit realizes a Clifford operator, and that every Clifford
operator can be realized as a (polynomial-size) Clifford circuit.
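As a small illustration of the definition, one can check by direct matrix multiplication how the single-qubit generators act on the Pauli matrices. The sketch below is an illustrative check only; the helper names are assumptions, and `S` stands for the gate called P = diag(1, i) above.

```python
import math


def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(A):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(A[j][i]).conjugate() for j in range(2)] for i in range(2)]


def conjugate_by(U, P):
    """Return U P U^dagger."""
    return matmul(matmul(U, P), dagger(U))


def close(A, B, tol=1e-12):
    """Entrywise comparison up to a small numerical tolerance."""
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))


s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
S = [[1, 0], [0, 1j]]      # the gate denoted P = diag(1, i) in the text
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
```

With these definitions, `conjugate_by(H, X)` agrees with `Z` and `conjugate_by(S, X)` agrees with `Y`, so both generators map Pauli matrices to Pauli matrices (possibly with a sign, as for `conjugate_by(H, Y)`).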

Lemma 5. Let $P_1, \dots, P_m$ be a collection of commuting n-qubit Pauli operators. Then there
exists a Clifford operation C such that $C^\dagger P_i C = Q_i$ for every i, where each $Q_i$ is a Z-type
Pauli operator. Moreover each $Q_i$ as well as the description of a poly-size Clifford circuit
realizing C can be determined efficiently.

Proof. It suffices to prove the result for Hermitian Pauli operators since every Pauli operator
can be made Hermitian by providing it with a suitable overall phase. Thus henceforth we
assume that the $P_i$ are Hermitian. We can write all m vectors $r(P_i)$ in an $m \times 2n$ matrix
and pick out a maximal set of independent row vectors over $\mathbb{Z}_2$ efficiently by Gaussian
elimination. W.l.o.g. we assume these are the first $l$ vectors. The corresponding Pauli
operators $\{P_1, \dots, P_l\} =: S$ form an independent set, i.e. no operator in S can be written as
a product of the other elements of S. In addition, no product of operators in S yields $-I$.
Indeed, suppose there exist bits $x_j$, not all zero, such that $P_1^{x_1} \cdots P_l^{x_l} = -I$. This would
imply that $\sum_j x_j r(P_j) = 0$, contradicting the linear independence of the $r(P_j)$. Since
the operators in S are Hermitian, independent and commuting and since no product of some
of these operators yields $-I$, there exists a stabilizer code V of dimension $2^{n-l}$ stabilized by
S [21]. This implies in particular that $l \le n$. Using standard stabilizer techniques one can
efficiently compute additional Hermitian Pauli operators $S' = \{R_{l+1}, \dots, R_n\}$ such that all
operators in the set $T = S \cup S'$ mutually commute, are independent and no product of these
operators yields $-I$ [21]. These n operators are the stabilizers of a 1-dimensional stabilizer
code, i.e. a stabilizer state $|\psi\rangle$. In other words $|\psi\rangle$ satisfies $P_i|\psi\rangle = |\psi\rangle = R_j|\psi\rangle$ for every
$i = 1, \dots, l$ and $j = l+1, \dots, n$, and moreover it is the unique state doing so. It is well
known that there exists a poly-size n-qubit Clifford circuit C such that $|\psi\rangle = \gamma C|0\rangle^{\otimes n}$ for some
global phase $\gamma$; moreover a description of C can be computed efficiently [22]. Now define
$Q_i = C^\dagger P_i C$ for every $i = 1, \dots, m$. Each $Q_i$ is an efficiently computable Pauli operator
since C is a poly-size Clifford circuit. Since $C|0\rangle^{\otimes n}$ equals $|\psi\rangle$ up to a global phase and
$P_j|\psi\rangle = |\psi\rangle$ for every $P_j \in S$, one has $Q_j|0\rangle^{\otimes n} = |0\rangle^{\otimes n}$. This last property together with
the fact that each $Q_j$ is a Pauli operator implies that $Q_j$ must be of Z-type. Finally, since
each $P_k$ with $k \ge l+1$ can be written, up to a global phase, as a product of operators within
S and since products of Z-type Pauli operators are again of Z-type, it follows that also $Q_k$
is of Z-type. □
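The first step of the proof, selecting a maximal independent subset of the vectors $r(P_i)$, can be sketched as follows. This is a hypothetical helper (`independent_rows`) using a standard XOR-basis form of Gaussian elimination over $\mathbb{Z}_2$; it is meant only to illustrate that the step is efficient and does not construct the Clifford circuit C.

```python
def independent_rows(rows):
    """Indices of a maximal linearly independent subset of `rows` over Z_2.

    Each row is a non-empty tuple of bits. Rows are processed in order, so a row
    is kept exactly when it is not a mod-2 combination of the rows kept before it.
    """
    basis = {}   # highest set bit -> reduced vector (stored as an integer bitmask)
    kept = []
    for idx, row in enumerate(rows):
        v = int("".join(str(bit) for bit in row), 2)
        while v:
            top = v.bit_length() - 1
            if top not in basis:
                basis[top] = v      # v is independent of everything seen so far
                kept.append(idx)
                break
            v ^= basis[top]         # eliminate the leading bit and continue
    return kept
```

Applied to the bit strings of XX, ZZ and YY, namely `[(1,1,0,0), (0,0,1,1), (1,1,1,1)]`, the function keeps the first two rows only, since YY is the product of XX and ZZ up to a phase.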

5.2 CT states 

Here we recall a result (theorem 6 below) stating that a general class of quantum processes
can be simulated weakly. First we need some definitions. Consider a family of n-qubit
states $|\psi_n\rangle \equiv |\psi\rangle$ specified in terms of some classical description, say a quantum circuit
preparing $|\psi\rangle$ from the state $|0\rangle$. Following [8], $|\psi\rangle$ is said to be computationally tractable
(CT) (relative to this description) if

(a) it is possible to sample in poly(n) time with classical means from the probability
distribution $\mathrm{Prob}(x) = |\langle x|\psi\rangle|^2$ on the set of n-bit strings x, and

(b) for any bit string x, the coefficient $\langle x|\psi\rangle$ can be computed in poly(n) time on a classical
computer with exponential precision.
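A simple example of a CT state is a product state, for which both properties are immediate. The sketch below is illustrative and assumes normalized single-qubit factors $(\alpha_k, \beta_k)$; the function names `amplitude` and `sample` are not from the paper.

```python
import random


def amplitude(factors, x):
    """<x|psi> for |psi> = tensor_k (alpha_k|0> + beta_k|1>), property (b).

    `factors` is a list of pairs (alpha_k, beta_k); `x` is a tuple of bits.
    """
    amp = 1
    for (alpha, beta), bit in zip(factors, x):
        amp *= beta if bit else alpha
    return amp


def sample(factors, rng):
    """Draw x with probability |<x|psi>|^2, one qubit at a time, property (a)."""
    return tuple(1 if rng.random() < abs(beta) ** 2 else 0
                 for (_alpha, beta) in factors)
```

Both routines run in time linear in n, so every product state is CT relative to the list of its single-qubit factors.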

Second, an n-qubit unitary operator U is said to be monomial if there exists a permutation
$\pi$ on the set of n-bit strings and a family of complex phases $\lambda_x$, such that

$$U|x\rangle = \lambda_x |\pi(x)\rangle \quad \text{for every } |x\rangle. \tag{20}$$

In other words U maps each standard basis state to another one, up to a phase.
Equivalently, one has $U = PD$ where $D = \sum_x \lambda_x |x\rangle\langle x|$ is a diagonal matrix and
$P = \sum_x |\pi(x)\rangle\langle x|$ is a permutation matrix. The operation U is said to be efficiently
computable if the functions

$$x \mapsto \lambda_x, \quad x \mapsto \pi(x) \quad \text{and} \quad x \mapsto \pi^{-1}(x) \tag{21}$$

can be computed efficiently.

In our simulation of commuting Pauli circuits we will use the following classical simulation
result proved in [8].

Theorem 6. (CT states) Let $|\varphi\rangle$ and $|\psi\rangle$ be n-qubit CT states and let U be an n-qubit
efficiently computable monomial operation. Then there exists a polynomial time classical
algorithm to approximate $\langle\varphi|U|\psi\rangle$ with polynomial accuracy (and exponentially small
probability of failure).

For our purposes, it will be relevant that every stabilizer state is CT. More precisely,
for every polynomial-size Clifford circuit C and standard basis state $|x\rangle$ (where x is an n-bit
string), the state $|\psi\rangle = C|x\rangle$ is CT relative to the description of C and the input x. Property
(a) is the content of the Gottesman-Knill theorem [4]. Property (b) was shown in [22];
in fact for every stabilizer state $|\psi\rangle$ the standard basis coefficients $\langle y|\psi\rangle$ can be computed
exactly. We refer to [8] for a more extensive discussion of CT states.

As for monomial operators, it is easily shown using (18) that every Pauli operator is
unitary, monomial and efficiently computable. Second, every unitary operator of the form
$\exp[i\theta Q]$, where Q is any (Hermitian) Z-type Pauli operator, is diagonal and hence
monomial. Furthermore it is straightforward to show that any such operator is efficiently
computable. More generally, it is useful to note (and easy to show):

Lemma 6. If $U_1, \dots, U_k$ are efficiently computable monomial unitary n-qubit operators and
k = poly(n), then also $\prod_{i=1}^{k} U_i$ is efficiently computable monomial.
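The content of lemma 6 can be made concrete with a small sketch. Here a monomial operation is represented, as an assumption of this illustration, by a pair of Python functions: `act(x)` returns $(\lambda_x, \pi(x))$ and `inv(y)` returns $\pi^{-1}(y)$. The constructors `pauli`, `z_rotation` and `compose` are hypothetical names.

```python
import cmath


def pauli(t, a, b):
    """P = i^t prod_k X_k^{a_k} Z_k^{b_k}, using P|x> = i^t (-1)^{b.x} |x xor a>."""
    def act(x):
        sign = (-1) ** sum(bk * xk for bk, xk in zip(b, x))
        return (1j ** t) * sign, tuple(xk ^ ak for xk, ak in zip(x, a))

    def inv(y):
        return tuple(yk ^ ak for yk, ak in zip(y, a))

    return act, inv


def z_rotation(theta, b):
    """exp(i*theta*Q) for the Z-type operator Q = prod_k Z_k^{b_k} (diagonal)."""
    def act(x):
        sign = (-1) ** sum(bk * xk for bk, xk in zip(b, x))
        return cmath.exp(1j * theta * sign), tuple(x)

    return act, (lambda y: tuple(y))


def compose(*ops):
    """The product U_k ... U_1 of monomial operations ops = (U_1, ..., U_k)."""
    def act(x):
        phase = 1
        for op_act, _ in ops:            # U_1 acts first
            p, x = op_act(x)
            phase *= p
        return phase, x

    def inv(y):
        for _, op_inv in reversed(ops):  # undo U_k first
            y = op_inv(y)
        return y

    return act, inv
```

Each call to the composed `act` or `inv` costs one call per factor, so a product of poly(n) efficiently computable monomial operations is again efficiently computable.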

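5.3 Proof of theorem 4

For clarity we prove theorem 4 separately even though it is superseded by theorem 5. Denote
the input by $|x\rangle$ where x is an n-bit string. Denote the Pauli circuit by $\mathcal{U}$ and let $e^{i\theta_j P_j}$
denote its gates ($1 \le j \le m$). Let $\langle Z_i\rangle$ denote the expectation value of $Z_i$, i.e. of Z
measured on qubit i. First we invoke lemma 5, yielding a Clifford circuit C satisfying
$C^\dagger P_j C = Q_j$ for some efficiently computable Hermitian Z-type operators $Q_j$. It follows that

$$e^{i\theta_j P_j} = C e^{i\theta_j Q_j} C^\dagger \tag{22}$$

and therefore $\mathcal{U} = C \mathcal{D} C^\dagger$ where $\mathcal{D}$ is given by the product of the m diagonal operators
$e^{i\theta_j Q_j}$. Denote $P = C^\dagger Z_i C$ which is an efficiently computable Pauli operator. Furthermore
denote $|\psi\rangle := C^\dagger|x\rangle$. Then

$$\langle Z_i\rangle = \langle x|\mathcal{U}^\dagger Z_i \mathcal{U}|x\rangle = \langle\psi|\mathcal{D}^\dagger P \mathcal{D}|\psi\rangle. \tag{23}$$

Since C is a Clifford circuit, $|\psi\rangle$ is a CT state. Finally $M := \mathcal{D}^\dagger P \mathcal{D}$ is monomial and
efficiently computable: indeed the Pauli operator P as well as each $e^{i\theta_j Q_j}$ are efficiently
computable monomial, as discussed in section 5.2. Applying lemma 6 then shows that M
is efficiently computable monomial as well. Theorem 6 can now be applied with
$|\varphi\rangle = |\psi\rangle$ and U = M, yielding an estimate of $\langle Z_i\rangle$ with polynomial accuracy. Since the two
measurement outcomes occur with probabilities $(1 \pm \langle Z_i\rangle)/2$, such an estimate suffices to
sample the output bit to within polynomial accuracy.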
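As a sanity check of this construction, consider the smallest case n = 1 with a single gate $e^{i\theta X}$ and input $|0\rangle$. This worked example is an illustration added here under the assumption that the Clifford operation of lemma 5 is chosen as C = H, which is valid since HXH = Z.

$$
\begin{aligned}
Q_1 &= H X H = Z, \qquad \mathcal{D} = e^{i\theta Z}, \qquad P = H Z H = X, \qquad |\psi\rangle = H|0\rangle = |+\rangle, \\
\langle Z\rangle &= \langle +|\, e^{-i\theta Z} X e^{i\theta Z} \,|+\rangle
 = \langle +|\, X e^{2 i\theta Z} \,|+\rangle
 = \langle +|\, e^{2 i\theta Z} \,|+\rangle
 = \cos 2\theta .
\end{aligned}
$$

This agrees with the direct computation $e^{i\theta X}|0\rangle = \cos\theta\,|0\rangle + i\sin\theta\,|1\rangle$, for which $\langle Z\rangle = \cos^2\theta - \sin^2\theta = \cos 2\theta$.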

5.4 Proof of theorem 5

We assume w.l.o.g. that Z is measured on the first qubit. Let $\mathcal{C}'$ be obtained by interspersing
the commuting Pauli circuit $\mathcal{C} = \prod_j e^{i\theta_j P_j}$ with k additional gates $e^{i\theta'_j Q_j}$ at arbitrary
places in the circuit. Write

$$e^{i\theta'_j Q_j} = [\cos\theta'_j]\, I + [i \sin\theta'_j]\, Q_j \tag{24}$$

for every such additional gate. Doing so, the circuit $\mathcal{C}'$ is written as a linear combination of
$2^k$ circuits (with coefficients that are products of k factors of the form $\cos\theta'_j$ or
$i\sin\theta'_j$), each of which is obtained by replacing every $e^{i\theta'_j Q_j}$ by either I or $Q_j$. Thus
every circuit in the linear combination is obtained by interspersing $\mathcal{C}$ with at most k Pauli
operators. Using that $e^{i\theta P} Q = Q e^{\pm i\theta P}$ for every two Hermitian Pauli operators P and Q
(with the sign + if P and Q commute and the sign - if they anticommute), the $Q_j$ can all be
commuted to the right. As a result, we find that $\mathcal{C}'$ is written in the form

$$\mathcal{C}' = \sum_{\alpha=1}^{2^k} a_\alpha\, \mathcal{C}_\alpha \Sigma_\alpha, \tag{25}$$

where each coefficient $a_\alpha$ is efficiently computable, where each $\Sigma_\alpha$ is a Pauli operator and
where each $\mathcal{C}_\alpha$ is a commuting Pauli circuit obtained by flipping a subset of the signs
$P_j \to -P_j$ in the commuting circuit $\mathcal{C}$. Furthermore there are only poly(n) terms in the
sum since $k = O(\log n)$ by assumption. To arrive at an efficient weak simulation of $\mathcal{C}'$
followed by measurement of $Z_1$, it suffices to show that each of the matrix elements

$$\langle x|\Sigma_\alpha^\dagger\, \mathcal{C}_\alpha^\dagger\, Z_1\, \mathcal{C}_\beta\, \Sigma_\beta|x\rangle \tag{26}$$

can be estimated efficiently with polynomial accuracy. First we can commute $Z_1$ to the
right, transforming $\mathcal{C}_\beta$ into a commuting Pauli circuit $\mathcal{C}'_\beta$ obtained by changing some of
the signs $P_j \to \pm P_j$ as before. Note that the combined circuit $\mathcal{C}_\alpha^\dagger \mathcal{C}'_\beta$ is a commuting Pauli
circuit since all gates have the form $e^{\pm i\theta_j P_j}$. Furthermore $\Sigma_\alpha|x\rangle$ and $Z_1\Sigma_\beta|x\rangle$ are, up to
global phases, simple standard basis states, say $|y\rangle$ and $|z\rangle$ respectively, which can be
computed efficiently. Analogous to the proof of theorem 4 we write $\mathcal{C}_\alpha^\dagger \mathcal{C}'_\beta = U \mathcal{D} U^\dagger$ where
U is a polynomial size Clifford circuit and $\mathcal{D}$ is a product of diagonal gates. Putting everything
together we find that (26) is, up to an efficiently computable overall phase, of the form
$\langle y|U \mathcal{D} U^\dagger|z\rangle$ for some standard basis states $|y\rangle$ and $|z\rangle$. Since $U^\dagger|y\rangle$ and $U^\dagger|z\rangle$ are CT
states (see section 5.2) and since $\mathcal{D}$ is efficiently computable monomial, we can apply theorem 6,
yielding an efficient classical algorithm to estimate (26). This proves the result.
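The commutation rule used to move the Pauli operators to the right can be verified on single-qubit matrices. The sketch below is only an illustrative check with hypothetical helper names; `rotation` builds $e^{i\theta P}$ from the identity $e^{i\theta P} = \cos\theta\, I + i\sin\theta\, P$, which holds for any Hermitian Pauli matrix P.

```python
import math

X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]


def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def rotation(theta, P):
    """exp(i*theta*P) = cos(theta) I + i sin(theta) P for a Hermitian Pauli P."""
    c, s = math.cos(theta), 1j * math.sin(theta)
    return [[c * (i == j) + s * P[i][j] for j in range(2)] for i in range(2)]


def close(A, B, tol=1e-12):
    """Entrywise comparison up to a small numerical tolerance."""
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))


theta = 0.7
# X and Z anticommute, so moving Z through exp(i*theta*X) flips the sign of theta.
anticommuting_case = close(matmul(rotation(theta, X), Z), matmul(Z, rotation(-theta, X)))
# Z commutes with itself, so the sign is unchanged.
commuting_case = close(matmul(rotation(theta, Z), Z), matmul(Z, rotation(theta, Z)))
```

Both flags evaluate to `True`, whereas omitting the sign flip in the anticommuting case does not give equal matrices.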

6 Mapping non-commuting circuits to commuting circuits

Here we show that commuting circuits can be used to efficiently reproduce the output of 
certain non-commutative processes. These results will provide evidence that commuting 
circuits can be used to solve tasks that appear nontrivial for classical computers. 

6.1 Two-layer circuits 

For every constant k we let $\Gamma_k$ denote a computational model involving a universal classical
computer supplemented with a restricted quantum computer operating with uniformly
generated families of k-local commuting circuits acting on an arbitrary product input state
and followed by Z measurement of the first qubit. By construction, $\Gamma_k$ has the power to
efficiently solve every problem in the complexity class P, for every k. Our goal is to investigate
whether $\Gamma_k$-computations have the potential to outperform classical computers.

Theorem 7. (Mapping k-local non-commuting to (k+1)-local commuting circuits)
Let $\mathcal{C}_1$ and $\mathcal{C}_2$ be uniform families of k-local n-qubit commuting circuits, where the
gates in $\mathcal{C}_1$ need not commute with those in $\mathcal{C}_2$. Then there exists a polynomial time
$\Gamma_{k+1}$-algorithm which approximates $\langle 0|\mathcal{C}_1\mathcal{C}_2|0\rangle$ with polynomial accuracy (with success
probability exponentially close to 1).

The above result shows that the non-commutativity in the two-layer circuit $\mathcal{C}_1\mathcal{C}_2$ can be
"removed" by allowing gates to act on k + 1 qubits. The proof is an immediate consequence
of the following alternate version of the Hadamard test (which regards arbitrary, i.e. not
necessarily commuting, circuits).

Lemma 7. (Alternate Hadamard test) Let $\mathcal{U} = U_{2m} \cdots U_1$ be an n-qubit quantum
circuit of even size 2m. Add one extra qubit line (henceforth called qubit 1) and for every
$i = 1, \dots, m$ define the gate

$$W_i = |0\rangle\langle 0| \otimes U_{2m+1-i}^\dagger + |1\rangle\langle 1| \otimes U_i, \tag{27}$$

which acts on qubit 1 and the qubits on which $U_i$ and $U_{2m+1-i}$ acted in the initial circuit $\mathcal{U}$.
Consider the following circuit $\mathcal{U}'$ acting on the (n+1)-qubit input $|0\rangle$: first, apply H to qubit
1; second, apply the gates $W_1, \dots, W_m$ in this order; third, apply H to qubit 1; finally
measure Z on qubit 1. Then the probability of outputting 0 is

$$p(0) = \frac{1}{2}\left(1 + \mathrm{Re}\,\langle 0|\mathcal{U}|0\rangle\right). \tag{28}$$

Analogously, replacing H in the third step by HP with P = diag(1, i) yields the imaginary
part of $\langle 0|\mathcal{U}|0\rangle$.

Remark that lemma 7 requires $\mathcal{U}$ to have even size. This is however not an essential
requirement since a circuit of odd size 2m + 1 can be "padded" with an additional identity
gate. This yields a circuit $\mathcal{U}'$ of size m + 1.

The proof of the lemma is obtained by directly computing p(0). Similar to the Hadamard
test, the above result provides a simple quantum algorithm to estimate matrix elements of
unitary quantum circuits with polynomial accuracy (and with success probability
exponentially close to 1). Different from the standard Hadamard test, however, is that the size of
the circuit $\mathcal{U}'$ used in lemma 7 is half the size of the original circuit $\mathcal{U}$, i.e. the alternate
Hadamard test is "twice as fast". The price to pay for this is that the gates in $\mathcal{U}'$ act on a
larger number of qubits: if $\mathcal{U}$ is a k-local circuit then $\mathcal{U}'$ can be as much as (2k + 1)-local.
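The statement of lemma 7 can be checked numerically in the simplest setting of a single system qubit. The sketch below is an illustration under stated assumptions: it takes $W_i$ to apply $U_{2m+1-i}^\dagger$ when qubit 1 is in $|0\rangle$ and $U_i$ when it is in $|1\rangle$, and it tracks these two branches separately instead of building the full two-qubit matrices, which is valid because every $W_i$ is block diagonal with respect to qubit 1. All function names are hypothetical.

```python
import cmath
import math
import random


def matvec(A, v):
    """Apply a 2x2 matrix to a length-2 vector."""
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]


def dagger(A):
    """Conjugate transpose of a 2x2 complex matrix."""
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]


def random_unitary(rng):
    """A 2x2 unitary [[a, -conj(b)], [b, conj(a)]] with |a|^2 + |b|^2 = 1."""
    t, p, q = (rng.uniform(0, 2 * math.pi) for _ in range(3))
    a = cmath.exp(1j * p) * math.cos(t)
    b = cmath.exp(1j * q) * math.sin(t)
    return [[a, -b.conjugate()], [b, a.conjugate()]]


def p0_alternate(gates):
    """p(0) of the alternate Hadamard test for gates = [U_1, ..., U_2m]."""
    m = len(gates) // 2
    branch0 = [1, 0]    # system state attached to ancilla |0>
    branch1 = [1, 0]    # system state attached to ancilla |1>
    for i in range(1, m + 1):                               # W_1 is applied first
        branch0 = matvec(dagger(gates[2 * m - i]), branch0)  # U_{2m+1-i}^dagger
        branch1 = matvec(gates[i - 1], branch1)              # U_i
    # After the final H the ancilla-|0> component is (branch0 + branch1) / 2.
    return sum(abs(x + y) ** 2 for x, y in zip(branch0, branch1)) / 4


def matrix_element(gates):
    """<0| U_2m ... U_1 |0>, computed directly."""
    v = [1, 0]
    for U in gates:
        v = matvec(U, v)
    return v[0]
```

For randomly chosen, non-commuting gates the value `p0_alternate(gates)` agrees with `0.5 * (1 + matrix_element(gates).real)` up to rounding, while only m controlled gates are applied.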

Proof of theorem 7. Without loss of generality we can assume that $\mathcal{C}_1$ and $\mathcal{C}_2$ are in
standard form, say $\mathcal{C}_1 = G_m \cdots G_1$ and $\mathcal{C}_2 = G'_m \cdots G'_1$ where $m = \binom{n}{k}$. By definition of the
standard form, for every subset S of k qubits there is precisely one gate $G_i$ and one gate $G'_j$
such that $\mathrm{supp}(G_i) \subseteq S$ and $\mathrm{supp}(G'_j) \subseteq S$. By suitably labeling the gates in both circuits
we can ensure that always j = m + 1 - i. Now apply lemma 7 to the circuit $\mathcal{U} := \mathcal{C}_1\mathcal{C}_2$ with
the identification $U_i := G'_i$ and $U_{m+i} := G_i$ for every $i = 1, \dots, m$. Then each gate (27) acts
on the qubits in S together with qubit 1 so that this gate is (k + 1)-local (at most). Note
furthermore that all gates $W_i$ mutually commute. Finally, define $W'_i := [H \otimes I]\, W_i\, [H \otimes I]$
where H acts on qubit 1. Since all H gates in the middle cancel out, the (k + 1)-local
commuting circuit $\mathcal{C} = \prod_i W'_i$ acting on $|0\rangle$ followed by measurement of $Z_1$ yields the same
output as the circuit $\mathcal{U}'$ of lemma 7. This allows one to estimate the real part of
$\langle 0|\mathcal{C}_1\mathcal{C}_2|0\rangle$ with polynomial accuracy within the class $\Gamma_{k+1}$. The imaginary part is treated
analogously. □

6.2 Constant-depth circuits 

Here we will relate commuting circuits with constant-depth circuits comprising arbitrary
gates.

Theorem 8. (Estimating constant-depth matrix elements) Let $\mathcal{U}$ be a uniform family
of n-qubit quantum circuits of constant depth m. Then there exists a polynomial time
$\Gamma_k$-algorithm to approximate $|\langle 0|\mathcal{U}|0\rangle|^2$ with polynomial accuracy (and with success
probability exponentially close to 1) where $k = 2^m + 1$.

Recall that the problem of estimating matrix elements $|\langle 0|\mathcal{U}|0\rangle|$ of polynomial size
quantum circuits of arbitrary depth is known to be BQP-hard (and the naturally corresponding
decision problem is BQP-complete). Theorem 8 shows that such matrix elements can be
estimated efficiently with k-local commuting circuits with constant k as long as $\mathcal{U}$ has
constant depth (with an exponential scaling of k with m). Although one would not expect
the constant-depth matrix problem to be BQP-hard, this task appears to be nontrivial for
classical computers and, to our knowledge, no efficient classical algorithm is known.

Proof of theorem 8. Letting $Z_j$ denote the operator Z acting on qubit j, we define
$Z(S) = \prod_{j \in S} Z_j$ for every subset $S \subseteq \{1, \dots, n\}$. Using

$$|0\rangle\langle 0| = \frac{1}{2^n} \sum_S Z(S), \tag{29}$$

where the sum is over all subsets S, one finds

$$|\langle 0|\mathcal{U}|0\rangle|^2 = \langle 0|\mathcal{U}^\dagger|0\rangle\langle 0|\mathcal{U}|0\rangle = \frac{1}{2^n} \sum_S \langle 0|\mathcal{U}^\dagger Z(S)\, \mathcal{U}|0\rangle. \tag{30}$$

Setting $G_j := \mathcal{U}^\dagger Z_j \mathcal{U}$ yields

$$\langle 0|\mathcal{U}^\dagger Z(S)\, \mathcal{U}|0\rangle = \langle 0| \prod_{j \in S} G_j |0\rangle =: F(S) \tag{31}$$

for every subset S. Since the $Z_j$ mutually commute, the $G_j$ mutually commute as well, as
these operators are obtained by simultaneously conjugating the $Z_j$. Furthermore since $\mathcal{U}$ has
depth m, each $G_j$ acts on at most $2^m$ qubits. Thus F(S) is a matrix element of a $2^m$-local
commuting circuit. Via the standard Hadamard test (recall the Hadamard test figure and the
proof of theorem 3) one constructs a k-local commuting circuit with $k = 2^m + 1$ which allows
one to estimate any such matrix element with polynomial accuracy in polynomial time, with
success probability exponentially close to 1.
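The projector identity (29) is diagonal in the standard basis, so it can be confirmed by brute force for small n. The following sketch is an illustrative check; subsets S are encoded as 0/1 indicator tuples and the function names are assumptions.

```python
from itertools import product


def z_string_value(S, x):
    """<x|Z(S)|x> = (-1)^(number of j in S with x_j = 1); S is an indicator tuple."""
    return (-1) ** sum(s * xj for s, xj in zip(S, x))


def projector_diagonal(x):
    """<x| 2^(-n) sum_S Z(S) |x>, summing over all 2^n subsets S."""
    n = len(x)
    return sum(z_string_value(S, x) for S in product((0, 1), repeat=n)) / 2 ** n
```

For every bit string x the value of `projector_diagonal(x)` is 1 if x is the all-zero string and 0 otherwise, which is exactly the diagonal of $|0\rangle\langle 0|$.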

We now use these findings to give an efficient $\Gamma_k$-algorithm to estimate $\gamma := |\langle 0|\mathcal{U}|0\rangle|^2$
with polynomial accuracy. Owing to (30)-(31), one has $\gamma = 2^{-n} \sum_S F(S)$. Thus $\gamma$ equals
the expectation value of a random variable that takes the value F(S) when the subset S is
drawn uniformly at random from the collection of all $2^n$ subsets. Fix $\epsilon > 0$. First we generate
K subsets $S_\alpha \subseteq \{1, \dots, n\}$ uniformly at random. Applying the Chernoff-Hoeffding bound we
find that, for some sufficiently large $K = \mathrm{poly}(n, 1/\epsilon)$, one has

$$\left| \frac{1}{K} \sum_{\alpha=1}^{K} F(S_\alpha) - \gamma \right| \le \epsilon/2 \tag{32}$$

with probability exponentially close to 1. Next, as described above we can efficiently
compute an estimate $f_\alpha$ of each $F(S_\alpha)$ using $\Gamma_k$-circuits with $k = 2^m + 1$; more precisely, we
compute K numbers $f_\alpha$ satisfying $|f_\alpha - F(S_\alpha)| \le \epsilon/2$. The runtime of the computation will
be $\mathrm{poly}(n, 1/\epsilon)$ and the success probability exponentially close to 1. Finally, we compute
$c := [\sum_{\alpha=1}^{K} f_\alpha]/K$ which takes $\mathrm{poly}(n, 1/\epsilon)$ time as well. Using (32) and the triangle
inequality it follows that $|c - \gamma| \le \epsilon$. Thus c is our desired polynomial approximation of $\gamma$. □
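The sampling step of this proof can be sketched classically once the values F(S) are available. In the sketch below, which is an illustration and not the algorithm of the paper, the quantum estimation of F(S) is replaced by a toy closed form: for the depth-1 circuit $\mathcal{U} = \bigotimes_j e^{i\theta_j X}$ one has $F(S) = \prod_{j \in S} \cos 2\theta_j$ and $\gamma = \prod_j \cos^2\theta_j$. The names `sample_size`, `estimate_gamma` and `toy_F`, and the explicit Hoeffding constant, are assumptions of this example.

```python
import math
import random


def sample_size(eps, delta):
    """Number K of samples of a [-1, 1]-valued variable such that, by Hoeffding's
    inequality, the sample mean is within eps/2 of its expectation except with
    probability at most delta."""
    return math.ceil(8 * math.log(2 / delta) / eps ** 2)


def estimate_gamma(F, n, eps, delta, rng):
    """Average F over K uniformly random subsets of {0, ..., n-1}."""
    K = sample_size(eps, delta)
    total = 0.0
    for _ in range(K):
        S = [j for j in range(n) if rng.random() < 0.5]   # uniform random subset
        total += F(S)
    return total / K


thetas = [0.3, 1.1, 0.5, 0.9]


def toy_F(S):
    """F(S) for the product circuit U = tensor_j exp(i*theta_j*X)."""
    return math.prod(math.cos(2 * thetas[j]) for j in S)


exact_gamma = math.prod(math.cos(t) ** 2 for t in thetas)
estimate = estimate_gamma(toy_F, len(thetas), eps=0.1, delta=1e-9, rng=random.Random(0))
```

The sample size grows only as $1/\epsilon^2$ and logarithmically in the inverse failure probability, independently of the $2^n$ subsets being averaged over.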

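Finally we note that theorem 8 can be generalized in the following rather intriguing sense:
using $\Gamma_k$-circuits one can also efficiently estimate matrix elements of the form $|\langle 0|\mathcal{U}\mathcal{C}|0\rangle|^2$
where $\mathcal{U}$ is again a constant-depth circuit and where $\mathcal{C}$ represents an arbitrary uniform family
of Clifford circuits. Interestingly, these Clifford circuits need not have constant depth. The
proof, which is given in appendix A, uses an argument analogous to the proof of theorem 8
combined with the alternate Hadamard test given in lemma 7.

A A generalization of theorem 8

Theorem 9. Let $\mathcal{U}$ be a (uniform family of) n-qubit quantum circuit(s) of depth m. Let $\mathcal{C}$ be
a (uniform family of) n-qubit Clifford circuit(s). Then the problem of estimating the matrix
element $|\langle 0|\mathcal{U}\mathcal{C}|0\rangle|^2$ with polynomial accuracy and with success probability exponentially (in
n) close to 1 is in $\Gamma_k$ with $k = 2^m + 1$.

Proof. Since $|\langle 0|\mathcal{U}\mathcal{C}|0\rangle| = |\langle 0|\mathcal{C}^\dagger\mathcal{U}^\dagger|0\rangle|$, and since $\mathcal{C}^\dagger$ is again a Clifford circuit and
$\mathcal{U}^\dagger$ again has depth m, it suffices to consider matrix elements of the form $|\langle 0|\mathcal{C}\mathcal{U}|0\rangle|^2$.
Similar to (30) one has

$$|\langle 0|\mathcal{C}\mathcal{U}|0\rangle|^2 = \frac{1}{2^n} \sum_S \langle 0|\mathcal{U}^\dagger \mathcal{C}^\dagger Z(S)\, \mathcal{C}\, \mathcal{U}|0\rangle. \tag{33}$$

Since $\mathcal{C}$ is Clifford, $\mathcal{C}^\dagger Z(S)\, \mathcal{C} =: P$ is a Pauli operator which can moreover be determined
efficiently; note that we suppress the dependence of P on S to simplify notation. Following (18),
we can write

$$P = i^t \prod_k X_k^{a_k} Z_k^{b_k}, \quad \text{where } t \in \{0,1,2,3\},\ a_k, b_k \in \{0,1\}. \tag{34}$$

Now define $G_k := \mathcal{U}^\dagger X_k^{a_k} \mathcal{U}$ and $H_k := \mathcal{U}^\dagger Z_k^{b_k} \mathcal{U}$ as well as $\mathcal{C}_1 := \prod_k G_k$ and
$\mathcal{C}_2 := \prod_k H_k$. Then

$$\langle 0|\mathcal{U}^\dagger \mathcal{C}^\dagger Z(S)\, \mathcal{C}\, \mathcal{U}|0\rangle = i^t \langle 0|\mathcal{C}_1 \mathcal{C}_2|0\rangle. \tag{35}$$

Since the $X_k$ mutually commute, the $G_k$ mutually commute as well. Furthermore each $G_k$
acts on at most $2^m$ qubits. Therefore $\mathcal{C}_1$ is a $2^m$-local commuting circuit. Similarly, $\mathcal{C}_2$ is a
$2^m$-local commuting circuit as well. We can now apply theorem 7, showing that $\langle 0|\mathcal{C}_1\mathcal{C}_2|0\rangle$
can be estimated with polynomial accuracy using $\Gamma_k$-circuits with $k = 2^m + 1$. Continuing
the argument as in the proof of theorem 8 completes the proof. □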

References 

[1] P. W. Shor (1999), Polynomial-time algorithms for prime factorization and discrete 
logarithms on a quantum computer, SIAM review, vol. 41, no. 2, pp. 303-332. 

[2] R. Jozsa and N. Linden (2003), On the role of entanglement in quantum computational 
speed-up, Proc. R. Soc. A, vol. 459, pp. 2011-2032, quant-ph/0201143. 



[3] G. Vidal (2003), Efficient classical simulation of slightly entangled quantum computa- 
tions, Phys. Rev. Lett., vol. 91, p. 147902, quant-ph/0301063. 

[4] D. Gottesman (1997), Stabilizer Codes and Quantum Error Correction, quant- 
ph/9705052. 

[5] L. G. Valiant (2002), Quantum Circuits That Can Be Simulated Classically in Polyno- 
mial Time, SIAM J. Comput., vol. 31, pp. 1229-1254. 

[6] E. Knill (2001), Fermionic Linear Optics and Matchgates, quant-ph/0108033. 

[7] R. Jozsa and A. Miyake (2008), Matchgates and classical simulation of quantum circuits, 
Proc. R. Soc. A, vol. 464, pp. 3089-3106, arXiv:0804.4050. 

[8] M. Van den Nest (2010), Simulating quantum computers with probabilistic methods, 
Quantum Inf. and Comp., vol. 11, pp. 784-812, arXiv:0911.1624. 

[9] M. Van den Nest (2012), Efficient classical simulations of quantum Fourier transforms 
and normalizer circuits over Abelian groups, arXiv:1201.4867. 

[10] S. Aaronson and A. Arkhipov (2011), The computational complexity of linear optics in 
Proceedings of the 43rd annual ACM symposium on Theory of computing, pp. 333-342, 
ACM, arXiv:1011.3245. 

[11] M. J. Bremner, R. Jozsa, and D. J. Shepherd (2011), Classical simulation of commuting 
quantum computations implies collapse of the polynomial hierarchy, Proc. R. Soc. A, 
vol. 467, pp. 459-472, arXiv:1005.1407. 

[12] S. Jordan (2010), Permutational quantum computing, Quantum Inf. and Comp., vol. 
10, no. 5, pp. 470-497, arXiv:0906.2508. 

[13] H. Briegel and R. Raussendorf (2001), Persistent entanglement in arrays of interacting 
particles, Physical Review Letters, vol. 86, no. 5, pp. 910-913, quant-ph/0004051. 

[14] S. Bravyi and M. Vyalyi (2005), Commutative version of the k-local Hamiltonian prob- 
lem and common eigenspace problem, Quantum Inf. and Comp., vol. 5, pp. 187-215, 
quant-ph/0308021. 

[15] D. Aharonov and L. Eldar (2011), On the complexity of Commuting Local Hamiltoni- 
ans, and tight conditions for Topological Order in such systems in Foundations of Com- 
puter Science (FOCS), 2011 IEEE 52nd Annual Symposium on, pp. 334-343, IEEE, 
arXiv:1102.0770. 

[16] N. Schuch (2011), Complexity of commuting Hamiltonians on a square lattice of 
qubits, Quantum Information and Computation, vol. 11, no. 11-12, pp. 901-912, 
arXiv:1105.2843. 

[17] M. Hastings (2012), Trivial Low Energy States for Commuting Hamiltonians, and the 
Quantum PCP Conjecture, arXiv:1201.3387. 

[18] D. Shepherd and M. J. Bremner (2009), Temporally unstructured quantum computation, 
Proc. R. Soc. A, vol. 465, pp. 1413-1439, arXiv:0809.0847. 



[19] D. Shepherd (2010), Binary Matroids and Quantum Probability Distributions,
arXiv:1005.1744.

[20] P. Hayden, D. Leung, P. Shor, and A. Winter (2004), Randomizing quantum states: 
Constructions and applications, Commun. Math. Phys., vol. 250, no. 2, pp. 371-391, 
quant-ph/0307104. 

[21] M. A. Nielsen and I. L. Chuang (2000), Quantum computation and quantum informa- 
tion. Cambridge University Press. 

[22] J. Dehaene and B. De Moor (2003), The Clifford group, stabilizer states, and linear and 
quadratic operations over GF(2), Phys. Rev. A, vol. 68, p. 042318, quant-ph/0304125. 


